fix(graph): introduce Status.SKIPPED for cancel_node; graph continues downstream by Zelys-DFKH · Pull Request #2258 · strands-agents/sdk-python

Zelys-DFKH · 2026-05-06T22:21:32Z

Description

When cancel_node is set in a BeforeNodeCallEvent hook, _execute_node raised RuntimeError after emitting MultiAgentNodeCancelEvent, which propagated through _execute_nodes_parallel and killed the graph. Downstream nodes never ran, the overall status became FAILED, and interrupt-and-resume workflows had no recovery path.

The fix introduces Status.SKIPPED. The bypassed node records result=None and status=Status.SKIPPED, lands in completed_nodes so downstream readiness checks are satisfied, and returns cleanly. Using Status.COMPLETED here would be wrong: a node that never ran is not completed, and conflating the two makes it impossible to distinguish "ran successfully" from "intentionally bypassed" in loop control and observability tools.

Three fixes came alongside:

_build_node_input filters SKIPPED deps before building dependency_results, so when all upstream deps are skipped the downstream node receives the original task with no orphaned "From node_x:" header
_from_dict reads the persisted NodeResult.status to restore Status.SKIPPED correctly on resume, rather than hardcoding Status.COMPLETED for everything in completed_nodes
GraphResult gains skipped_nodes: int; completed_nodes now counts only nodes that actually ran

Also opened #2401 for the complementary CANCELLED semantic (abort-branch, downstream does not run). That issue raises the cancel_node → skip_node naming question; if maintainers prefer Option A, the rename can happen in this PR before it merges.

Related Issues

Resolves #2240

Documentation PR

N/A

Type of Change

Bug fix

Testing

Ran hatch run prepare. Updated test_graph_cancel_node (parametrized ×2) to assert Status.SKIPPED on both NodeResult.status and node.execution_status. Updated test_graph_cancel_node_downstream_executes to assert: downstream node executes (step_b.stream_async.assert_called_once()), step_b receives only the original task with no orphaned header, graph_result.skipped_nodes == 1, and graph_result.completed_nodes == 1.

I ran hatch run prepare

Checklist

I have read the CONTRIBUTING document
I have added any necessary tests that prove my fix is effective or my feature works
I have updated the documentation accordingly
I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
My changes generate no new warnings
Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Closes strands-agents#2240

yananym · 2026-05-29T17:10:19Z

Thanks for working on this @Zelys-DFKH — this addresses a real pain point. I filed the original issue (#2240) and have been thinking about the design here, especially considering future graph looping support (revisiting previous nodes via edge conditions).

Proposal: Introduce two distinct statuses — `SKIPPED` and `CANCELLED`

After looking at how the TypeScript Strands SDK and LangGraph handle this, I think we should distinguish between two semantics:

1. `SKIPPED` — skip this node, continue downstream (what this PR does)

This is the "skip and continue" semantic. The node is intentionally bypassed, but downstream dependents should still execute. This is what my bug report asks for.

Design:

Introduce Status.SKIPPED enum value (not COMPLETED — a node that never ran shouldn't be marked as completed)
Dependency resolution treats SKIPPED the same as COMPLETED — the node counts as "satisfied" for downstream readiness checks in _find_newly_ready_nodes / _is_node_ready_for_resume
_build_node_input should skip the header entirely for SKIPPED deps (no empty "From node_x:" with nothing underneath)
reset_on_revisit clears SKIPPED nodes (same as COMPLETED) — they become eligible for execution again on loop re-entry
The NodeResult should not store a RuntimeError — a skipped node isn't an error. Use a lightweight marker (e.g., None result or a dedicated sentinel) so get_agent_results() cleanly returns []

The SKIPPED status is valuable because:

It's semantically honest — a node that never ran isn't "completed"
Edge conditions on outgoing edges can inspect execution_status == SKIPPED vs COMPLETED and route differently (powerful for loop control)
Observability — dashboards/logs can distinguish "ran successfully" from "intentionally bypassed"
No ambiguity with nodes that ran but legitimately returned empty output

2. `CANCELLED` — kill this branch gracefully (align with TS SDK)

This is what the TypeScript SDK does with beforeEvent.cancel: the node is cancelled, marked as a terminus, and downstream nodes do NOT execute. No error propagates — the graph can still complete via other branches.

This could be a separate hook field (e.g., cancel_node for cancel, skip_node for skip) or a single field with enum values ("skip" vs "cancel").

Why both are needed (especially for looping)

Consider a loop: A → B → C → (edge back to A if condition)

SKIPPED: On iteration 2, B is skipped because it already gathered data. C still runs, the back-edge fires, and the loop continues. This is the common case for conditional execution within loops.
CANCELLED: You want to abort a branch entirely — e.g., an optional side-branch that's no longer relevant. The main loop path continues via other branches.

Without the distinction, you can't express "skip but continue" vs "stop this path" — both of which are legitimate control flow decisions.

Complementary mechanism: Edge conditions with `invocation_state`

PR #2305 (merged) added invocation_state to edge condition evaluation, which provides an alternative way to route around nodes at the edge level. The two mechanisms are complementary:

Edge conditions: Control whether to traverse a path (node never enters the execution batch)
skip_node / cancel_node hooks: Control whether to execute a node that's already been reached (node is in the batch but gets bypassed)

Both are needed because edge conditions are evaluated before batch formation, while hooks fire after.

Happy to collaborate on the implementation if this direction makes sense. The core of your PR is right — the change from "raise RuntimeError" to "return cleanly" is exactly the fix. The suggestion is to layer proper status semantics on top.

Zelys-DFKH · 2026-05-30T00:00:17Z

Agreed on SKIPPED vs COMPLETED. Updated; your analysis of the header-leak risk in _build_node_input was spot-on, too.

result=None, Status.SKIPPED throughout. _build_node_input filters SKIPPED deps before building dependency_results, so when all upstream deps are skipped the downstream node gets just the original task with no orphaned "From step_a:" header. _from_dict reads the persisted NodeResult.status to restore Status.SKIPPED correctly on resume. Also added skipped_nodes to GraphResult and adjusted completed_nodes to exclude SKIPPED nodes, which closes the observability gap.

SKIPPED nodes live in completed_nodes rather than a separate set. That satisfies both _find_newly_ready_nodes and _compute_ready_nodes_for_resume without touching either function. reset_on_revisit works correctly for loop re-entry, too: SKIPPED nodes are in completed_nodes, so they get cleared and become eligible for execution again on the next iteration.

edge.should_traverse is still evaluated for outgoing edges of SKIPPED nodes, which preserves the division you described: edge conditions control whether to enter a path before batch formation, hooks control whether to execute after.

CANCELLED (abort-branch semantics) is a separate concern; I'll open an issue.

Replace Status.COMPLETED + RuntimeError with Status.SKIPPED + None for nodes bypassed via cancel_node. A skipped node never ran and should not be marked completed; downstream readiness checks treat SKIPPED the same as COMPLETED by keeping skipped nodes in completed_nodes. - Add Status.SKIPPED enum value - NodeResult.result accepts None; to_dict/from_dict round-trip via {"type": "skipped"}; get_agent_results() returns [] for None - _build_node_input filters SKIPPED deps before building dependency_results, so downstream nodes receive only the original task with no orphaned "From node_x:" header - _from_dict restores Status.SKIPPED on node.execution_status via the persisted NodeResult.status rather than blindly setting COMPLETED - GraphResult gains skipped_nodes counter; completed_nodes now counts only nodes that actually ran - reset_on_revisit correctly clears SKIPPED nodes for loop re-entry (they are already in completed_nodes) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Zelys-DFKH · 2026-05-30T09:24:19Z

Opened #2401 to track CANCELLED (abort-branch semantics). It also surfaces the cancel_node → skip_node naming question — happy to make that rename in this PR if maintainers prefer Option A before it merges.

fix(graph): treat cancel_node as control flow, not fatal error

fddde2e

Closes strands-agents#2240

Zelys-DFKH requested a deployment to manual-approval May 6, 2026 22:21 — with GitHub Actions Waiting

github-actions Bot added the size/s label May 6, 2026

Zelys-DFKH requested a deployment to manual-approval May 6, 2026 22:21 — with GitHub Actions Waiting

Zelys-DFKH mentioned this pull request May 8, 2026

[BUG] cancel_node in BeforeNodeCallEvent raises RuntimeError that kills the entire graph on resume #2240

Open

3 tasks

yonib05 added area-multiagent Multi-agent related area-hooks Features or requests that might be implementable via hooks labels May 27, 2026

yonib05 added python Pull requests that update python code bug Something isn't working labels May 29, 2026

yananym mentioned this pull request May 29, 2026

[FEATURE] Support graph cycles (looping back to previous nodes via edge conditions) #2387

Open

github-actions Bot added size/m and removed size/s labels May 30, 2026

Zelys-DFKH requested a deployment to manual-approval May 30, 2026 00:01 — with GitHub Actions Waiting

Zelys-DFKH mentioned this pull request May 30, 2026

[FEATURE] Status.CANCELLED: abort a branch without running downstream or propagating errors #2401

Open

Zelys-DFKH changed the title ~~fix(graph): treat cancel_node as control flow, not fatal error~~ fix(graph): introduce Status.SKIPPED for cancel_node; graph continues downstream May 30, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(graph): introduce Status.SKIPPED for cancel_node; graph continues downstream#2258

fix(graph): introduce Status.SKIPPED for cancel_node; graph continues downstream#2258
Zelys-DFKH wants to merge 2 commits into
strands-agents:mainfrom
Zelys-DFKH:fix/graph-cancel-node-graceful-skip

Zelys-DFKH commented May 6, 2026 •

edited

Loading

Uh oh!

yananym commented May 29, 2026

Uh oh!

Zelys-DFKH commented May 30, 2026

Uh oh!

Zelys-DFKH commented May 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Zelys-DFKH commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related Issues

Documentation PR

Type of Change

Testing

Checklist

Uh oh!

yananym commented May 29, 2026

Proposal: Introduce two distinct statuses — SKIPPED and CANCELLED

1. SKIPPED — skip this node, continue downstream (what this PR does)

2. CANCELLED — kill this branch gracefully (align with TS SDK)

Why both are needed (especially for looping)

Complementary mechanism: Edge conditions with invocation_state

Uh oh!

Zelys-DFKH commented May 30, 2026

Uh oh!

Zelys-DFKH commented May 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Zelys-DFKH commented May 6, 2026 •

edited

Loading

Proposal: Introduce two distinct statuses — `SKIPPED` and `CANCELLED`

1. `SKIPPED` — skip this node, continue downstream (what this PR does)

2. `CANCELLED` — kill this branch gracefully (align with TS SDK)

Complementary mechanism: Edge conditions with `invocation_state`